Iterative Value-Aware Model Learning
This paper introduces a model-based reinforcement learning (MBRL) framework that incorporates the underlying decision problem into learning the transition model of the environment. This is in contrast with conventional approaches to MBRL, which learn the model of the environment, for example by finding the maximum likelihood estimate, without taking the decision problem into account. The Value-Aware Model Learning (VAML) framework argues that this might not be a good idea, especially if the true model of the environment does not belong to the model class from which we are estimating the model. The original VAML framework, however, may result in an optimization problem that is difficult to solve. This paper introduces a new class of MBRL algorithms, called Iterative VAML, that benefits from the structure of how planning is performed (i.e., through approximate value iteration) to devise a simpler optimization problem. The paper theoretically analyzes Iterative VAML and provides a finite-sample error upper bound guarantee for it.
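To make the value-aware idea concrete, the following is a minimal sketch of an Iterative VAML-style model loss for a tabular MDP: instead of maximizing likelihood, the candidate model is scored by how well its predicted next-state value matches the value observed at sampled next states. All names (`model_probs`, `itervaml_loss`, `transitions`) are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def itervaml_loss(model_probs, V, transitions):
    """Value-aware model loss for a tabular MDP.

    model_probs: (S, A, S) array, candidate transition model P_hat.
    V: (S,) array, current value-function estimate from approximate
       value iteration.
    transitions: list of (s, a, s_next) samples from the true environment.

    Returns the mean squared gap between the model's expected next-state
    value and the sampled next-state value.
    """
    loss = 0.0
    for s, a, s_next in transitions:
        predicted = model_probs[s, a] @ V      # E_{s' ~ P_hat(.|s,a)}[V(s')]
        loss += (predicted - V[s_next]) ** 2   # compare against the sample
    return loss / len(transitions)
```

In an iterative scheme, this loss would be re-minimized each time the value estimate V is updated by a step of approximate value iteration, so the model only needs to be accurate in the directions that matter for the current values.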
Appendix A Proofs

A.1 Proof of Lemma 3.1

A.4 Proof of Lemma 4.5

The proof of Lemma 4.5 depends on another lemma, which will also be useful in the unknown hypothesis class setting. Let h ∈ H denote the decision maker's decision rule. From Corollary 4.6, we know that h(x) = 1{q_H(x) ≥ c} as long as q_H(x) ≠ c. If q_H(x) = c, we don't need to prove anything. Then there exists some c for which h_c(x) ∉ {0, 1}, which is impossible.
Reciprocal Learning
This paper considers learning algorithms ranging from active learning over multi-armed bandits to self-training. We show that all these algorithms not only learn parameters from data but also the other way around: they iteratively alter the training data in a way that depends on the current model fit. We introduce reciprocal learning as a generalization of these algorithms, using the language of decision theory, which allows us to study under what conditions they converge.
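The reciprocal pattern described above can be illustrated with self-training, one of its named instances: the model is fit to the labeled data, and the labeled data is then extended with pseudo-labels that depend on the current fit. The 1-D threshold "model" and all names here (`fit_threshold`, `self_train`, `margin`) are simplifying assumptions for illustration, not the paper's formalism.

```python
import numpy as np

def fit_threshold(X, y):
    """Fit a 1-D threshold classifier: predict 1 iff x >= theta.

    The threshold is the midpoint between the two class means, a
    deliberately simple stand-in for "learning parameters from data".
    """
    return (X[y == 0].mean() + X[y == 1].mean()) / 2.0

def self_train(X_labeled, y_labeled, X_unlabeled, margin=1.0, rounds=5):
    """Self-training loop: data and model alternately shape each other."""
    X, y = X_labeled.copy(), y_labeled.copy()
    for _ in range(rounds):
        theta = fit_threshold(X, y)                       # learn from data
        confident = np.abs(X_unlabeled - theta) > margin  # model picks points
        if not confident.any():
            break
        pseudo = (X_unlabeled[confident] >= theta).astype(int)
        X = np.concatenate([X, X_unlabeled[confident]])   # data now depends
        y = np.concatenate([y, pseudo])                   # on the model fit
        X_unlabeled = X_unlabeled[~confident]
    return theta
```

Each round, the training set itself is a function of the previous model fit, which is exactly the feedback loop whose convergence conditions reciprocal learning studies.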